How Knowledge Graphs Bring Order to the HRA’s Data Diversity
It takes a lot of data to construct the Human Reference Atlas (HRA).
But not all data is created equal!
HRA data comes from many different sources that may use different technologies and
follow different protocols.
The data itself comes in many different formats, some of which may require
specific code to read.
Some of it is old data, and some of it is new.
It may have been mixed with other data or repurposed.
Some data might be open research data, available to all,
while other data might have restrictions that limit access, use, and distribution.
For the HRA, we need the ability to easily find the data we
want, use it for our purposes, and share it as widely as possible.
Of course, that data needs to be structured so that machines can read it.
Ideally, though, that data structure would also be understandable to humans.
It would not only show what data exists in the HRA but also how pieces of that data
relate to each other.
By labeling our data and connecting our labeled nodes with relational links,
we put our data into context and create a framework for moving from data to
knowledge to insight.
The type of data structure we are moving toward here is known as a “knowledge
graph,” and knowledge graphs are a lot more common than you might think.
Google popularized the term back in 2012.
But now major companies like Facebook, Amazon, and Netflix all use knowledge
graphs to represent relationships between people, products, and concepts.
A knowledge graph gathers all the things that are important to a particular group or
organization.
These things can be people, places, entities, concepts, databases, or
documents, really just about anything.
Each of those entities is assigned a node. Then the knowledge graph organizes all
those nodes into a network of interrelations.
Using the Resource Description Framework (RDF), each of these relationships is
expressed as a subject, a predicate, and an object, with the predicate expressing
the relationship between the two entities.
This grouping is called a triple, and the relation between an anatomical structure and
its parent organ might look like this.
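As a minimal sketch, a triple can be written down in Python as a record with three fields. The names used here (Triple, renal_cortex, part_of, kidney) are illustrative placeholders, not actual HRA identifiers or predicates, and real RDF data would use full IRIs instead of short labels.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical labels chosen for illustration; real RDF uses IRIs.
statement = Triple(subject="renal_cortex", predicate="part_of", obj="kidney")

print(f"{statement.subject} --{statement.predicate}--> {statement.obj}")
```

Read aloud, the statement says that the anatomical structure (subject) is part of (predicate) its parent organ (object).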
Let's see how this might look for a particular digital object created for the Human
Reference Atlas.
Here's a 3D reference organ for the left female kidney.
And here's how it appears in the knowledge graph.
The "subject" entries in the left column all point to the same thing:
the HRA's 3D reference organ of the left female kidney.
The "object" column lists all the other data in the HRA that the reference organ is
connected to.
And the "predicate" column indicates the nature of that relationship.
A closer look at these predicates reveals relationships such as the creation date,
version number, the raw data the 3D kidney was derived from, and many more.
What we see here is actually a network of nodes and edges, with our kidney reference
organ as the central node with all its related data connected to it by labeled edges.
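The same idea can be sketched in a few lines of Python: every row of the table is a triple, and because all rows share one subject, they form a star-shaped network around it. The identifier, the predicate names, and the bracketed values below are hypothetical placeholders, not actual HRA metadata.

```python
from collections import defaultdict

# Hypothetical subject identifier and predicate names; the values are
# placeholders, not actual HRA metadata.
KIDNEY = "ref-organ/kidney-female-left"

triples = [
    (KIDNEY, "creation_date", "<creation date>"),
    (KIDNEY, "version", "<version number>"),
    (KIDNEY, "derived_from", "<raw data source>"),
]

# Group the labeled edges by the node they start from.
edges_by_node = defaultdict(list)
for subject, predicate, obj in triples:
    edges_by_node[subject].append((predicate, obj))

# Walk the edges that fan out from the central kidney node.
for predicate, obj in edges_by_node[KIDNEY]:
    print(f"{KIDNEY} --{predicate}--> {obj}")
```

Each printed line corresponds to one labeled edge leaving the central node.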
Of course, this is only one network. There are over 500 digital objects currently in
the HRA, each with its own network. And each network is connected to all the others.
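One way to picture how separate networks join up is through shared nodes: when two digital objects point to the same entity, their networks meet there. The sketch below assumes two made-up digital objects and made-up predicates; it is only meant to illustrate the idea, not how the HRA actually stores or links its data.

```python
# Two hypothetical digital objects, each with its own small network of triples.
network_a = {
    ("digital_object_a", "represents", "kidney"),
    ("digital_object_a", "version", "<version a>"),
}
network_b = {
    ("digital_object_b", "describes_cells_of", "kidney"),
    ("digital_object_b", "version", "<version b>"),
}


def nodes(triples):
    """Return every subject and object that appears in a set of triples."""
    return {t[0] for t in triples} | {t[2] for t in triples}


# Merging two networks is a set union of their triples.
merged = network_a | network_b

# Nodes that appear in both networks are where the networks connect.
shared = nodes(network_a) & nodes(network_b)
print(shared)
```

In this toy example the two networks connect through the single shared node "kidney".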
Using a knowledge graph helps us structure the massive amount of
different types of data that power the HRA.
It will also allow us to link up with other information
networks to create a wide and radically open web of knowledge about the human body.